1 Introduction
1.1 Package installation
Let’s start by installing the packages we will need in the following.
if(!require(devtools)){install.packages("devtools");library(devtools)}## Le chargement a nécessité le package : devtools
## Le chargement a nécessité le package : usethis
if(!require(ggmap)){install.packages("ggmap");library(ggmap)}## Le chargement a nécessité le package : ggmap
## Le chargement a nécessité le package : ggplot2
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
1.2 Recovery of hydrolic data
The data to be studied were presented in Dione (1992).
station_data <- read.csv("https://tinyurl.com/y2kosb2y")
longlat <- read.csv("https://tinyurl.com/yy9f9bsf")The KIDIRA, GOURBA(ssi) and
FADOU(gou) columns correspond to annual flows between 1903
and 1989. The columns GOUc and KILIc
correspond to complementary hydrolic data.
The dms2dd function of the biogeo package
allows to convert coordinates known as longitude and latitude expressed
in degrees, minutes and seconds into decimal degrees.
if(!require(biogeo)){install.packages("biogeo");library(biogeo)}## Le chargement a nécessité le package : biogeo
latdd<-with(longlat,dms2dd(LATDEG,LATMIN,0,LATNS))
latdd## [1] 14.46667 12.80000 12.73333 12.26667 11.31667 10.75000 14.46667 13.40000
## [9] 12.51667
min(latdd)## [1] 10.75
max(latdd)## [1] 14.46667
longdd<-with(longlat,dms2dd(LONGDEG,LONGMIN,0,LONGOE))
min(longdd)## [1] -12.3
max(longdd)## [1] -11.01667
typesite <- (longlat$NATURE.DU.SITE=="STATIONS PLUVIOMETRIQUES")+1
df_dd_stations <- data.frame(cbind(latdd,longdd,typesite))1.3 Graphic visualization of the position of the measuring stations.
Recovery of geographical data from the Internet.
bbox = bounding box.
min(longdd)## [1] -12.3
max(longdd)## [1] -11.01667
min(latdd)## [1] 10.75
max(latdd)## [1] 14.46667
faleme <- get_stamenmap(bbox = c(left = min(longdd)*1.05, bottom = min(latdd)/1.05, right = max(longdd)/1.05, top = max(latdd)*1.05),
zoom = 8)## Source : http://tile.stamen.com/terrain/8/118/117.png
## Source : http://tile.stamen.com/terrain/8/119/117.png
## Source : http://tile.stamen.com/terrain/8/120/117.png
## Source : http://tile.stamen.com/terrain/8/118/118.png
## Source : http://tile.stamen.com/terrain/8/119/118.png
## Source : http://tile.stamen.com/terrain/8/120/118.png
## Source : http://tile.stamen.com/terrain/8/118/119.png
## Source : http://tile.stamen.com/terrain/8/119/119.png
## Source : http://tile.stamen.com/terrain/8/120/119.png
## Source : http://tile.stamen.com/terrain/8/118/120.png
## Source : http://tile.stamen.com/terrain/8/119/120.png
## Source : http://tile.stamen.com/terrain/8/120/120.png
Several types of land are available.
maptypes = c("terrain", "terrain-background", "terrain-labels",
"terrain-lines", "toner", "toner-2010", "toner-2011",
"toner-background", "toner-hybrid", "toner-labels",
"toner-lines", "toner-lite", "watercolor")Retrieve another example of a map terrain type.
faleme_toner2011 <- get_stamenmap(bbox = c(left = min(longdd)*1.05, bottom = min(latdd)/1.05, right = max(longdd)/1.05, top = max(latdd)*1.05), zoom = 8, maptype="toner-2011")## Source : http://tile.stamen.com/toner-2011/8/118/117.png
## Source : http://tile.stamen.com/toner-2011/8/119/117.png
## Source : http://tile.stamen.com/toner-2011/8/120/117.png
## Source : http://tile.stamen.com/toner-2011/8/118/118.png
## Source : http://tile.stamen.com/toner-2011/8/119/118.png
## Source : http://tile.stamen.com/toner-2011/8/120/118.png
## Source : http://tile.stamen.com/toner-2011/8/118/119.png
## Source : http://tile.stamen.com/toner-2011/8/119/119.png
## Source : http://tile.stamen.com/toner-2011/8/120/119.png
## Source : http://tile.stamen.com/toner-2011/8/118/120.png
## Source : http://tile.stamen.com/toner-2011/8/119/120.png
## Source : http://tile.stamen.com/toner-2011/8/120/120.png
Adding the stations to the two types of maps of the region.
ggmap(faleme) + geom_point(
aes(x = longdd, y = latdd, color=typesite),
data = df_dd_stations, colour = c("red","green")[typesite], size = 3
)ggmap(faleme_toner2011) + geom_point(
aes(x = longdd, y = latdd, color=typesite),
data = df_dd_stations, colour = c("red","green")[typesite], size = 3
)2 Adjustments to hydrological data
if(!require(fitdistrplus)){install.packages("fitdistrplus");library(fitdistrplus)}## Le chargement a nécessité le package : fitdistrplus
## Le chargement a nécessité le package : MASS
## Le chargement a nécessité le package : survival
2.1 KIDIRA
Représentations des données.
plotdist(station_data$KIDIRA) descdist(station_data$KIDIRA, boot = 1000)## summary statistics
## ------
## min: 60 max: 963
## median: 212
## mean: 253.1034
## estimated sd: 186.9003
## estimated skewness: 2.230276
## estimated kurtosis: 7.713073
2.1.1 Some examples of fitting distributions to data.
2.1.1.1 Log-normal
KIDIRA.ln <- fitdist(station_data$KIDIRA, "lnorm")
summary(KIDIRA.ln)## Fitting of the distribution ' lnorm ' by maximum likelihood
## Parameters :
## estimate Std. Error
## meanlog 5.3476352 0.06169314
## sdlog 0.5754353 0.04362305
## Loglikelihood: -540.6132 AIC: 1085.226 BIC: 1090.158
## Correlation matrix:
## meanlog sdlog
## meanlog 1 0
## sdlog 0 1
plot(KIDIRA.ln)2.1.1.2 Gumbel
if(!require(actuar)){install.packages("actuar");library(actuar)}## Le chargement a nécessité le package : actuar
##
## Attachement du package : 'actuar'
## Les objets suivants sont masqués depuis 'package:stats':
##
## sd, var
## L'objet suivant est masqué depuis 'package:grDevices':
##
## cm
KIDIRA.gumbel <- fitdist(station_data$KIDIRA, "gumbel", start=list(alpha=10, scale=10))
summary(KIDIRA.gumbel)## Fitting of the distribution ' gumbel ' by maximum likelihood
## Parameters :
## estimate Std. Error
## alpha 183.3946 11.244267
## scale 101.3473 9.242219
## Loglikelihood: -548.6173 AIC: 1101.235 BIC: 1106.166
## Correlation matrix:
## alpha scale
## alpha 1.0000000 0.2565498
## scale 0.2565498 1.0000000
plot(KIDIRA.gumbel)2.1.1.3 Log-logistic
KIDIRA.ll <- fitdist(station_data$KIDIRA, "llogis", start = list(shape = 1, scale = 10))
summary(KIDIRA.ll)## Fitting of the distribution ' llogis ' by maximum likelihood
## Parameters :
## estimate Std. Error
## shape 3.199725 0.2905542
## scale 203.019858 11.6813127
## Loglikelihood: -539.0256 AIC: 1082.051 BIC: 1086.983
## Correlation matrix:
## shape scale
## shape 1.0000000 -0.0292221
## scale -0.0292221 1.0000000
plot(KIDIRA.ll)2.1.1.4 Pareto
KIDIRA.P <- fitdist(station_data$KIDIRA, "pareto", start = list(shape = 1, scale = 10))## Warning in sqrt(diag(varcovar)): Production de NaN
## Warning in sqrt(1/diag(V)): Production de NaN
## Warning in cov2cor(varcovar): diag(.) à 0 ou NA entrées ; les résultats non
## finis sont douteux
summary(KIDIRA.P)## Fitting of the distribution ' pareto ' by maximum likelihood
## Parameters :
## estimate Std. Error
## shape 9085979 4194.304
## scale 2300787437 NaN
## Loglikelihood: -568.4405 AIC: 1140.881 BIC: 1145.813
## Correlation matrix:
## shape scale
## shape 1 NaN
## scale NaN 1
plot(KIDIRA.P)2.1.1.5 Burr
KIDIRA.B <- fitdist(station_data$KIDIRA, "burr", start = list(shape1 = 1, shape2 = 1))## Warning in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, : Some
## parameter names have no starting/fixed value but have a default value: rate,
## scale.
summary(KIDIRA.B)## Fitting of the distribution ' burr ' by maximum likelihood
## Parameters :
## estimate Std. Error
## shape1 0.07115278 0.1742363
## shape2 2.62656386 6.4320445
## Loglikelihood: -698.1134 AIC: 1400.227 BIC: 1405.159
## Correlation matrix:
## shape1 shape2
## shape1 1.0000000 -0.9990415
## shape2 -0.9990415 1.0000000
plot(KIDIRA.B)2.1.2 Graphical comparison of the fits
By their densities.
denscomp(lapply(paste("KIDIRA",c("ln","gumbel","ll","P","B"),sep="."),get), legendtext = c("lognormal","gumbel", "loglogistic", "Pareto", "Burr"))By quantile-quantile diagrams.
qqcomp(lapply(paste("KIDIRA",c("ln","gumbel","ll","P","B"),sep="."),get), legendtext = c("lognormal","gumbel", "loglogistic", "Pareto", "Burr"))By their distribution functions.
cdfcomp(lapply(paste("KIDIRA",c("ln","gumbel","ll","P","B"),sep="."),get), legendtext = c("lognormal","gumbel", "loglogistic", "Pareto", "Burr"))By probability-probability diagrams.
ppcomp(lapply(paste("KIDIRA",c("ln","gumbel","ll","P","B"),sep="."),get), legendtext = c("lognormal","gumbel", "loglogistic", "Pareto", "Burr"))2.1.3 Numerical comparison of fits
gofstat(lapply(paste("KIDIRA",c("ln","gumbel","ll","P","B"),sep="."),get), fitnames = c("lognormal","gumbel", "loglogistic", "Pareto", "Burr"))## Goodness-of-fit statistics
## lognormal gumbel loglogistic Pareto
## Kolmogorov-Smirnov statistic 0.1134799 0.1244879 0.07528331 0.2882154
## Cramer-von Mises statistic 0.1778413 0.2931510 0.06870337 1.7262407
## Anderson-Darling statistic 1.3924941 2.6919756 0.81675664 9.1683672
## Burr
## Kolmogorov-Smirnov statistic 0.535988
## Cramer-von Mises statistic 6.994845
## Anderson-Darling statistic 32.306880
##
## Goodness-of-fit criteria
## lognormal gumbel loglogistic Pareto Burr
## Akaike's Information Criterion 1085.226 1101.235 1082.051 1140.881 1400.227
## Bayesian Information Criterion 1090.158 1106.166 1086.983 1145.813 1405.159
2.2 Other distributions
The initial study suggests testing the fit to Galton’s (= lognormal), Goodrich’s, Pearson’s 3, Gumbel, …
2.3 Other techniques
Apply alternative distribution selection approaches to these data.
3 Applications to other variables.
plotdist(station_data$GOURBA) plotdist(station_data$FADOU) plotdist(station_data$GOUc) plotdist(station_data$KILIc) descdist(station_data$GOURBA, boot = 1000)## summary statistics
## ------
## min: 81 max: 966
## median: 186
## mean: 264.8276
## estimated sd: 221.6834
## estimated skewness: 1.860305
## estimated kurtosis: 5.103623
descdist(station_data$FADOU, boot = 1000)## summary statistics
## ------
## min: 40 max: 998
## median: 152
## mean: 345.1379
## estimated sd: 310.0986
## estimated skewness: 0.8923851
## estimated kurtosis: 2.102951
Valeurs annuelles cumulées.
descdist(station_data$GOUc, boot = 1000)## summary statistics
## ------
## min: 320 max: 2681
## median: 1449
## mean: 1452.08
## estimated sd: 517.403
## estimated skewness: 0.09928642
## estimated kurtosis: 2.506413
descdist(station_data$KILIc, boot = 1000)## summary statistics
## ------
## min: 136 max: 3351
## median: 1701
## mean: 1690.644
## estimated sd: 712.0299
## estimated skewness: 0.02646055
## estimated kurtosis: 2.52774
4 Mystery data
Which distribution(s) is/are compatible with this data?
donnees <- read.csv("https://tinyurl.com/yxn9udgt")
str(donnees)## 'data.frame': 1000 obs. of 1 variable:
## $ donnees: num 1.346 1.116 0.918 1.096 1.403 ...